EDUCATION AND HUMAN DEVELOPMENT

A Study into the effects of education on advancing the development of countries from 1980 to 2010

Report by Kai Canoll

Introduction

Perhaps the most important development of the past century has been the spread of universal education worldwide. For example, the global literacy rate roughly doubled in 55 years, from 42% in 1960 to 86% in 2015. This rapid growth in literacy signals an astounding shift in the availability of education in parts of the world that historically had little or no access to it. Clearly, though, some countries have not had the resources or the commitment to educate as much of their population as others.

The question explored here is whether countries that committed more to furthering their people's education saw a tangible effect on their social and economic development. The best available marker for measuring this development is the Human Development Index (HDI). This paper shows which countries and regions struggled most with promoting education and development, whether the two factors are connected and where they are most connected, and what types of education signaled the greatest leaps in development for different countries.

Package and Data Importing

Below are the Python packages needed to run the analysis:

In [1]:
#Package Importation
import matplotlib.pyplot as plt

import plotly.express as px
import pandas as pd
import numpy as np
import matplotlib
import seaborn as sns #Needed by heatmap_Table below
import cufflinks as cf
import plotly
import plotly.offline as py
import plotly.graph_objs as go
import random 
import plotly.figure_factory as ff

Next, the education and development data were imported using pandas, then merged on country name and year for convenience. Some country names were changed within the CSV files to make them easier to join on. The education data was derived from the Barro-Lee Educational Attainment dataset, while the HDI information came from ArcGIS.

In [2]:
#Dataframe Importation from CSVs
Education_csv = pd.read_csv('Education.csv')

#print(Education_csv.sort_values(by=["country"])["country"].unique())

HDI_csv = pd.read_csv('Human_Development_Index_by_country,_2013.csv')

#Joining code
HDI_csv["country"] = HDI_csv["sovereignt"] #Two column names to be changed
HDI_csv["year"] = HDI_csv["YEAR"] 

#print(HDI_csv.sort_values(by=["country"])["country"].unique())
#Merge on country and year
Education_HDI_csv = pd.merge(Education_csv, HDI_csv, on=['country','year']) #Common value of choice
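
Because the merge above is an inner join, any country whose name is spelled differently in the two files is silently dropped. The following is a minimal sketch of how such mismatches could be surfaced. It uses small made-up frames, and the outer join with the `_merge` check is an illustrative assumption rather than a step from the original analysis:

```python
import pandas as pd

# Toy stand-ins for the two CSV files (all values are invented for illustration)
education = pd.DataFrame({
    "country": ["France", "Viet Nam", "Kenya"],
    "year": [1980, 1980, 1980],
    "yr_sch": [6.0, 4.1, 3.2],
})
hdi = pd.DataFrame({
    "country": ["France", "Vietnam", "Kenya"],
    "year": [1980, 1980, 1980],
    "HDI": [0.72, 0.46, 0.42],
})

# An outer join with indicator=True labels each row by the file(s) it came from
check = pd.merge(education, hdi, on=["country", "year"], how="outer", indicator=True)

# Rows found in only one file point to names that still need harmonizing
unmatched = sorted(check.loc[check["_merge"] != "both", "country"].unique())
print(unmatched)
```

Running a check like this before the inner join would make it easier to see which country names need to be edited in the CSV files.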

Some age values and regions were deemed unnecessary for analysis, such as Antarctica, so they were removed from the dataframe.

In [3]:
#Get rid of Antarctica and generalized age columns
Education_HDI_csv = Education_HDI_csv[Education_HDI_csv["ageto"] !=999]

Education_HDI_csv = Education_HDI_csv[Education_HDI_csv["region_wb"] !='Antarctica']

The columns were then renamed to be more descriptive than their original abbreviated names.

In [4]:
#Renaming Columns
Education_HDI_csv_renamed= Education_HDI_csv.rename(columns={"BLcode"	: "Barro-Lee Country Code"
,"WBcode"	: "World Bank Country Code"
,"region_code"	: "Region Code"
,"country"	: "Country Name"
,"year"	: "Year"
,"sex"	: "Sex"
,"agefrom"	: "Starting Age"
,"ageto"	: "Finishing Age"
,"lu"	: "Percentage of No Schooling Attained in Pop."
,"lp"	: "Percentage of Primary Schooling Attained in Pop."
,"lpc"	: "Percentage of Complete Primary Schooling Attained in Pop."
,"ls"	: "Percentage of Secondary Schooling Attained in Pop."
,"lsc"	: "Percentage of Complete Secondary Schooling Attained in Pop."
,"lh"	: "Percentage of Tertiary Schooling Attained in Pop."
,"lhc"	: "Percentage of Complete Tertiary Schooling Attained in Pop."
,"yr_sch"	: "Average Years of Schooling Attained"
,"yr_sch_pri"	: "Average Years of Primary Schooling Attained"
,"yr_sch_sec"	: "Average Years of Secondary Schooling Attained"
,"yr_sch_ter"	: "Average Years of Tertirary Schooling Attained"
,"pop"	: "Population"
,"pop15"	: "Total Population over 15"
,"pop25"	: "Total Population over 25"})

cf.go_offline() # required to use plotly offline (no account required).
py.init_notebook_mode() # graphs charts inline (IPython).

#Drop rows with any null values
Education_HDI_csv_renamed=Education_HDI_csv_renamed.dropna()

#Add the approximate birth year of each age group (survey year minus starting age)
Education_HDI_csv_renamed["Children Born in Year"] = Education_HDI_csv_renamed["Year"] - Education_HDI_csv_renamed["Starting Age"]
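
The birth-year column above is simply the survey year minus the lower bound of the age band. A small sketch on invented rows shows the arithmetic; the toy frame is an assumption for illustration, while the column names follow the renamed dataframe:

```python
import pandas as pd

# Toy rows: survey year and lower bound of the age band (invented values)
df = pd.DataFrame({"Year": [1980, 2010, 2010], "Starting Age": [15, 15, 45]})

# Approximate birth year of the youngest members of each age band
df["Children Born in Year"] = df["Year"] - df["Starting Age"]
print(df["Children Born in Year"].tolist())
```

Note that 15 year olds in 1980 and 45 year olds in 2010 map to the same birth year, which is what lets the later plots follow a cohort across surveys.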

Function Creation

Using the plotly and pandas packages primarily, interactive graphing functions and dataframe filters were created to help present the data.

In [5]:
###Function Declaration

#Give each value a constant color in plotly
def constant_color_assigner(column):
    COLOR_DICT={}
    col_uniques = column.unique()
    col_uniques.sort() #Get constant order for these values

    if len(col_uniques) > 100:
        return COLOR_DICT
    
    for idx, val in enumerate(col_uniques): #Iterate through the sorted unique values
        COLOR_DICT[val] = color_list_long[idx % len(color_list_long)] #Cycle through the palette if it runs out

    return COLOR_DICT

#Lineplot creator for 2-3 factors within a Dataframe.
def Group_by_two_factors_df_lineplot_maker(Fact1,text_hover,Value,df,titler="",slider_text=""):

    yaxiser=dict( range=[ min(df[Value]), max(df[Value])])
    if slider_text=="":
        #Group by the two factors and take the mean of the value
        df=df.groupby([Fact1,text_hover],  as_index=False).agg({Value:"mean"})
    else: 
        #Group by the three factors and take the mean of the value
        df=df.groupby([Fact1,text_hover,slider_text],  as_index=False).agg({Value:"mean"})
    
    #Get rid of index value and randomize order
    df = df.sample(frac=1).reset_index(drop=True)
    
    #Sort the dataframe by the categorical value
    df = df.sort_values(by=[Fact1])
    
    #If there is no slider text, don't add in the function
    if slider_text == "":
        fig = px.line(df, x=Fact1, color=text_hover
                      , color_discrete_map = constant_color_assigner(df[text_hover]),
                 y=Value,
                 title=titler)
    else:
        #Line Plot code that creates color and slider toggler, sorts by the slider, adds title
        fig = px.line(df, x=Fact1, color=text_hover
                      , animation_frame=slider_text,
                      category_orders={slider_text: list(df[slider_text].sort_values().unique())}
                      ,color_discrete_map = constant_color_assigner(df[text_hover])
                      ,y=Value
                      ,title=titler)
    
    #Set size and color of Figure
    fig.update_layout(
    margin=dict(l=80, r=80, t=100, b=80), 
    paper_bgcolor="LightSteelBlue",yaxis=yaxiser,)
    
    #Display figure
    fig.show()

#Create a unique list of values for the column and sort it by a sort column possibly    
def constant_unique(df,main_column,sort_by_column=""):
    if sort_by_column=="": #If there is no sort column
        date_order = df[main_column].unique()
    else:
        date_order = df.sort_values(by=[sort_by_column])[main_column].unique()
    return date_order


#Creates a barplot from the dataframe, with or without sliders
def Plotly_Bar_Plot(Fact1,Value,df,text_hover="",slider_text="",Color_Titler=""
                    ,titler=""
                    ,category_order=False,Sort_fact=""):
    
    yaxiser=dict( range=[ min(df[Value]), max(df[Value])]) #Set value limits at both ends
    
    if category_order == True: #If you want there to be a certain order to the data.
        if Sort_fact == "":
                 category_order={'categoryorder':'array',
                                 'categoryarray':constant_unique(df,Fact1)}
        else:
                category_order={'categoryorder':'array','categoryarray':constant_unique(df,Fact1,Sort_fact)}
    else:
        category_order={'categoryorder':'total ascending'}
    #df = df.sort_values(by=[slider_text])
    if Color_Titler == "" and slider_text == "":
        fig = px.bar(df, x=df[Fact1], y=df[Value], color=df[Fact1],  color_discrete_map = constant_color_assigner(df[Fact1]), hover_data=[df[text_hover]])

    elif slider_text != "":
        df = df.sort_values(by=[Value])
        fig = px.bar(df, x=df[Fact1], y=df[Value], color=df[Fact1]
                    , animation_frame=slider_text
                    , color_discrete_map = constant_color_assigner(df[Fact1])
                    ,category_orders={slider_text: list(df[slider_text].sort_values().unique())}, hover_data=[df[text_hover]])


    else:
        fig = px.bar(df, x=df[Fact1], y=df[Value], color=df[Color_Titler], color_discrete_map = constant_color_assigner(df[Color_Titler]), hover_data=[df[text_hover]])

    fig.update_layout(title=titler   , xaxis_title=Fact1,   xaxis=category_order,yaxis= yaxiser)
    fig.update_layout( margin=dict(l=80, r=80, t=100, b=80),paper_bgcolor="LightSteelBlue",)
    
    fig.show()        
    
def Plotly_Hist_Plot(Fact1,df,text_hover="",Color_Titler="",titler=""):

    fig = px.bar(df, x=df[Fact1], color=df[Color_Titler], color_discrete_map = constant_color_assigner(df[Color_Titler]) ,  hover_data=[df[text_hover]])

    fig.update_layout(title=titler   , xaxis_title=Fact1)
    fig.update_layout(
    margin=dict(l=80, r=80, t=100, b=80),
    paper_bgcolor="LightSteelBlue",
        )
    
    fig.show()
        


    
def Plotly_Hist_Plot_Groupby_Country(Fact1,Value,text_hover="",Color_Titler="",titler=""
                                     ,df=Education_HDI_csv_renamed,slider_text=""):
    if slider_text=="":
        df=df.groupby([Fact1,text_hover],  as_index=False).agg({Value:"mean"})
    else:
        df=df.groupby([Fact1,text_hover,slider_text],  as_index=False).agg({Value:"mean"})
    df = df.sample(frac=1).reset_index(drop=True)
    Plotly_Bar_Plot(Fact1,Value,df,text_hover=text_hover,slider_text=slider_text,Color_Titler=Color_Titler,titler=titler) #Pass by keyword so the arguments match the signature
    
#Pie Plot creator
def Plotly_Pie_Plot_1fact(Fact1,Value,df,text_hover="",Color_Titler="",titler=""):
    fig = px.pie(df, values=Value, names=Fact1, title=titler) #Use the function arguments rather than hard-coded columns
    fig.show()
    
def Plotly_Hist_Plot_1fact(Fact1,Value,text_hover="",Color_Titler="",titler="",df=Education_HDI_csv_renamed):
    if text_hover=="": #If there is no text column for the cursor to hover over, make it first factor
        text_hover=Fact1
    
    #Average and reset index of DF
    df=df.groupby([Fact1],  as_index=False).agg({Value:"mean"})
    df = df.sample(frac=1).reset_index(drop=True)
    
    color_dict = constant_color_assigner(df[text_hover]) #Empty if the column has too many unique values
    if color_dict != {}: #Use the constant color map when one is available
        fig = px.bar(df, x=Fact1, color=text_hover, color_discrete_map = color_dict,y=Value,title=titler).update_xaxes(categoryorder="total ascending") #Ascending bars
    else: #Otherwise fall back on plotly's default colors
        fig = px.bar(df, x=Fact1, color=text_hover,y=Value,title=titler).update_xaxes(categoryorder="total ascending")

    fig.show()
    
#Creates a dataframe with specific inputted values for the  Age, Year, and Region
def Dataframe_Loc(df,Age_Val= 0 ,Year_Val=0,Region_Val=""):
    if Age_Val != 0: #If a usable value was entered
        df = df.loc[df['Starting Age'] == Age_Val] #Filter dataframe to only those values
    if Year_Val != 0: #If a usable value was entered
        df = df.loc[df['Year'] == Year_Val] #Filter dataframe to only those values
    if Region_Val != "": #If a usable value was entered
        df = df.loc[df['region_wb'] == Region_Val] #Filter dataframe to only those values
        
    return df


#Create Difference by Difference Plot to show which countries improved the most
def Difference_Between_Two_Factors_2010_1980_Plot_Maker(Fact_1,Fact_2,df,titler,return_dataframe=True):
    

    df = df.sample(frac=1).reset_index(drop=True)
    df = df.sort_values(by=[Fact_1])
    df = df.sort_values(by=['Starting Age'])

    Education_HDI_csv_renamed_Age_15_Year_1980 = Dataframe_Loc(df,Age_Val= 0
    , Year_Val=1980)
    Education_HDI_csv_renamed_Age_15_Year_2010 = Dataframe_Loc(df,Age_Val= 0
    , Year_Val=2010)
    

    Education_HDI_csv_joined = pd.merge(Education_HDI_csv_renamed_Age_15_Year_2010,
                             Education_HDI_csv_renamed_Age_15_Year_1980,
                             on=['formal_en','Country Name','Starting Age'],suffixes=('', '_y'))

    
    Education_HDI_csv_renamed_Age_15_Year_1980_minus_2010  = pd.DataFrame(
        {'formal_en': Education_HDI_csv_joined['formal_en'],
         'Country Name': Education_HDI_csv_joined['Country Name'],
        Fact_1+'_Diff': Education_HDI_csv_joined[Fact_1]
        -Education_HDI_csv_joined[Fact_1+"_y"],
        Fact_2+ '_Diff': Education_HDI_csv_joined[Fact_2]
        -Education_HDI_csv_joined[Fact_2+"_y"]
        ,'region_wb': Education_HDI_csv_joined['region_wb'],
        'Starting Age': Education_HDI_csv_joined['Starting Age']})
         
         
    Education_HDI_csv_renamed_Age_15_Year_1980_minus_2010['Starting Age']= Education_HDI_csv_renamed_Age_15_Year_1980_minus_2010['Starting Age'].astype(str)+" : " +(2010 - Education_HDI_csv_renamed_Age_15_Year_1980_minus_2010['Starting Age']).astype(str)+"-" +(1980 - Education_HDI_csv_renamed_Age_15_Year_1980_minus_2010['Starting Age']).astype(str)

    #Education_HDI_csv_renamed_Age_15_Year_1980_minus_2010 =Education_HDI_csv_renamed_Age_15_Year_1980_minus_2010.dropna()
    if return_dataframe == False:
        Plotly_Scatter_Plot( Fact_1+'_Diff', Fact_2+ '_Diff'
                             ,Education_HDI_csv_renamed_Age_15_Year_1980_minus_2010,text_hover="formal_en",
                                Color_Titler="region_wb", titler= titler,slider_text='Starting Age')
    else:
        return Education_HDI_csv_renamed_Age_15_Year_1980_minus_2010

#Allows dataframe to be entered as a 3 dimensional heatmap
def df_to_plotly(df):
    return {'z': df.values.tolist(),'x': df.columns.tolist(),'y': df.index.tolist()}

#Creates a seaborn heatmap of the data by making a pivot table
def heatmap_Table(values_string,index_string,column_string,plottitle,agg_fun,df):
    fig = plt.figure(figsize=(15,15))
    piv = pd.pivot_table(df, values=values_string,index=[index_string],
    columns=[column_string], fill_value=0,aggfunc=agg_fun)
    #plot pivot table as heatmap using seaborn
    ax = sns.heatmap(piv, square=True,annot=True, fmt='g')
    plt.setp( ax.xaxis.get_majorticklabels(), rotation=90 )
    plt.title(plottitle)
    plt.tight_layout()
    plt.show()
    

    

"""Proportions of instances of each column"""
def proportional_instances_of_column_series(column_name,df,Normalizer_string_toggle = "mean"):
    if Normalizer_string_toggle == "mean":
        normalizer = True
    else:
        normalizer = False
    
    df = df.sort_values([column_name]).reset_index(drop=True)
    #Index_of_gender = SMORE_DF.column_name.unique()
    if Normalizer_string_toggle != 'mean':
        value_counts_obj = df[column_name].value_counts(normalize=normalizer,sort=True)
    else:
        value_counts_obj = df[column_name].value_counts(normalize=normalizer,sort=True).mul(100).round(1)
    value_counts_obj = value_counts_obj.sort_index()
    try:
        value_counts_obj = value_counts_obj.groupby(value_counts_obj.index // 1).sum()
    except:
        pass
    return pd.DataFrame(value_counts_obj)
    
  

def barplot_percentage_by_unique_group(column_name,value_name,SMORE_DF,Title_fig="",ylimer=[0,0]):
    
    COlumn_vals= SMORE_DF[column_name].unique()
    fig, axs = plt.subplots(ncols=1, nrows=int(len(COlumn_vals)), squeeze=False) #squeeze=False keeps axs an array even for a single group
    fig.set_size_inches(10, 10)
    fig.subplots_adjust(wspace=0.2)
    fig.subplots_adjust(hspace=0.5)
    for col, ax in zip(COlumn_vals, axs.flatten()):
        SMORE_DF_male_fem = SMORE_DF.loc[SMORE_DF[column_name].isin([col])]
        #SMORE_DF = SMORE_DF.sort_values([value_name]).reset_index(drop=True)
        #Index_of_gender = SMORE_DF.column_name.unique()
        value_counts_obj = SMORE_DF_male_fem[value_name].value_counts(normalize=True,sort=False) #Proportions within each group
        value_counts_obj = value_counts_obj.sort_index()
        try:
            value_counts_obj = value_counts_obj.groupby(value_counts_obj.index).sum()
        except:
            pass
        ax.bar(value_counts_obj.index,list(value_counts_obj))
        #plt.xticks(range(0,len(Index_of_gender)-1),Index_of_gender)
        #ax.ylabel("Percentage",fontsize=20)
        #ax.xlabel(column_name,fontsize=20)
        #ax.title(col,fontsize=30)
        ax.set_title(col)
        if ylimer!= [0,0]:
            ax.set_ylim(ylimer)
        ax.tick_params(axis='x', labelrotation=90)
    fig.suptitle(Title_fig)
    plt.tight_layout()
    plt.show()

def Plotly_Pie_Plot_1fact(Fact1,df,titler="",Value="Just_Ones"):
    df=df.dropna(subset=[Fact1])
    fig = px.pie(df, values=Value, names=Fact1, color=Fact1, title=titler,color_discrete_map = constant_color_assigner(df[Fact1]))
    fig.show()
    

"""Creates a Scatterplot from x and y values of dataframe, with no groupby"""
def Plotly_Scatter_Plot(Fact1,Value,df,text_hover="",Color_Titler="",titler="",slider_text=""):
    
    if slider_text == "":
        fig = px.scatter(df, x=df[Fact1], y=df[Value], color=df[Color_Titler]
                         , color_discrete_map = constant_color_assigner(df[Color_Titler])  
                         ,hover_data=[df[text_hover]],trendline="ols" )
    else:
        fig = px.scatter(df, x=df[Fact1], y=df[Value], color=df[Color_Titler],
                         animation_frame = df[slider_text],
                         category_orders={slider_text: list(df[slider_text].sort_values().unique())}
                         , color_discrete_map = constant_color_assigner(df[Color_Titler])  
                         ,hover_data=[df[text_hover]],trendline="ols" )
    #fig.add_traces(go.Scatter(x=df[Fact1], y=df[Value], name='Regression Fit'))
    # regression
    """
    reg = RidgeClassifier().fit(np.vstack(df[Fact1]), df[Value])
    df['bestfit'] = reg.predict(np.vstack(df[Fact1]))

    # plotly figure setup

    fig.add_trace(go.Scatter(name='line of best fit', x=df[Fact1], y=df['bestfit'], mode='lines'))
    """
    fig.update_layout(
    margin=dict(l=80, r=80, t=100, b=80),
    paper_bgcolor="LightSteelBlue",
        )
    
    fig.update_layout(title=titler   , xaxis_title=Fact1,yaxis_title=Value,
                            yaxis=dict(range=[min(df[Value]), max(df[Value])])) #Make sure the full dataframe is in so the visible axis is constant
    fig.update_xaxes(zeroline=True, zerolinewidth=2, zerolinecolor='Black')
    fig.update_yaxes(zeroline=True, zerolinewidth=2, zerolinecolor='Black')    
    fig.show()
    

##Annotated 2 dimensional Heatmap made with plotly
def Proportional_Instances_Category1_by_Category2_Heatmap(Category_To_Average,By_Category,DF_ORIG,Normalizer_string_toggle="mean"):
    

    df = pd.DataFrame(index=list(DF_ORIG[By_Category].unique()))
    for interest in list(DF_ORIG[Category_To_Average].unique()):
        Prop_Series = proportional_instances_of_column_series(By_Category,DF_ORIG[DF_ORIG[Category_To_Average]==interest],Normalizer_string_toggle=Normalizer_string_toggle)
        df2 = pd.DataFrame(Prop_Series)
        df2 = df2.rename(columns={By_Category:interest})
        df = df.join(df2)
    fig = ff.create_annotated_heatmap(x=df.columns.to_list(), y=df.index.to_list(), z=df.values, hoverinfo='z')
    #ig = go.Figure(data=go.Heatmap(df_to_plotly(df)))
    fig.update_layout(title="Percent of " + Category_To_Average + " Associated with each " + By_Category)
    fig.show()
    
#Consistent color scheme
color_list_long = ['yellow','blue','green','red','orange','purple','magenta']


color_list_long= color_list_long + color_list_long
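
The idea behind `constant_color_assigner` is that every category keeps the same color from one chart to the next. A minimal standalone sketch of that idea follows. The helper name `assign_colors` and the shortened palette are hypothetical, and cycling through the palette when categories outnumber colors is an assumption about the desired behavior:

```python
from itertools import cycle

def assign_colors(values, palette):
    """Map each distinct value to a palette color in sorted order,
    reusing colors from the start when the palette runs out."""
    return dict(zip(sorted(set(values)), cycle(palette)))

palette = ["yellow", "blue", "green"]  # shortened, hypothetical palette
regions = ["South Asia", "North America", "South Asia", "Europe", "Sub-Saharan Africa"]

color_map = assign_colors(regions, palette)
print(color_map)
```

Sorting the unique values first is what makes the mapping stable: the same set of categories always yields the same dictionary, regardless of row order.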
In [6]:
Education_HDI_csv_renamed['Percentage of Schooling Attained in Pop.']=100-Education_HDI_csv_renamed['Percentage of No Schooling Attained in Pop.']
In [7]:
#Plot those dataframe values on the world map
def World_Map_Plotly(df,color_column,slider_text="",titler=""):
    yaxiser=dict( range=[ min(df[color_column]), max(df[color_column])])
    if slider_text == "":

        fig = px.choropleth(df, locations="formal_en",
                            locationmode='country names',
                            color=color_column #Column used to color each country
                          
                            ,color_continuous_scale=px.colors.sequential.Plasma)
    else:
        fig = px.choropleth(df, locations="formal_en",
                            locationmode='country names',
                            color=color_column #Column used to color each country
                           
                            , animation_frame=slider_text,
                            category_orders={slider_text: list(df[slider_text].sort_values().unique())}
                            , range_color=[min(df[color_column]),max(df[color_column])])  
    fig.update_layout(title=titler)

    
    fig.show()

    

Initial Metrics on HDI

The Plotly graphs below are interactive, so feel free to experiment with the sliders and filters, or simply hover over the data for more information.

Some countries did not have any educational metrics associated with them and so, unfortunately, were not included in the analysis.

In [8]:
World_Map_Plotly(Education_HDI_csv_renamed,"HDI",slider_text="YEAR",titler="HDI of Each Country from 1980 to 2010")

Scrolling through the slider in the world map above, one can see that HDI generally improves throughout the world, and that countries with low and high index values tend to cluster together geographically. Because of this, region will be included as an additional factor.

In [9]:
diff_Data=Difference_Between_Two_Factors_2010_1980_Plot_Maker('Average Years of Schooling Attained','HDI',
                                                    df=Education_HDI_csv_renamed
                                                    ,titler = "Difference in Human Development Index compared to <br> Difference in Percentage Who Attended Primary School for 15 year olds for Each Country between 1980 and 2010"
                                                  ,return_dataframe=True )

World_Map_Plotly(diff_Data,"HDI_Diff",titler="Difference in HDI of Each Country from 1980 to 2010")

In this map, the lighter colors are associated with greater strides in HDI growth. This growth is seen most in the Central, South, and East Asian countries, while other regions have only sporadically seen high improvement in development. Countries that were already near the highest possible HDI ratings, such as those in North America and Western Europe, did not have far to climb and therefore did not change much. What is more worrying is that many of the under-developed Central African countries did not appear to climb much either.
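
The 1980-to-2010 differences shown on the map come from pairing each country's 2010 row with its 1980 row and subtracting. A minimal sketch of that step on invented data follows; the two toy countries and their HDI values are assumptions for illustration only:

```python
import pandas as pd

# Toy long-format data: one row per country and year (invented values)
df = pd.DataFrame({
    "Country Name": ["A", "A", "B", "B"],
    "Year": [1980, 2010, 1980, 2010],
    "HDI": [0.40, 0.65, 0.80, 0.90],
})

start = df[df["Year"] == 1980]
end = df[df["Year"] == 2010]

# Pair each country's 2010 row with its 1980 row; the 1980 columns get a "_y" suffix
joined = pd.merge(end, start, on="Country Name", suffixes=("", "_y"))
joined["HDI_Diff"] = joined["HDI"] - joined["HDI_y"]
print(joined[["Country Name", "HDI_Diff"]])
```

In this toy case the country that starts lower gains more, which mirrors the point above that countries already near the top of the index have little room left to climb.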

Educational Development Analysis

Spread of Education over Time

In [10]:
    
Group_by_two_factors_df_lineplot_maker('YEAR','region_wb',
                                       'Percentage of Schooling Attained in Pop.',
                                       df = Education_HDI_csv_renamed,
                                      titler="Average Schooling Attained in Population every 5 Years")

Looking at the above plot, one can see that all regions have, in the aggregate, improved their countries' access to schooling between 1980 and 2010. South Asia improved quickly enough that it is no longer the least educated region, while the Middle East and North Africa have doubled the percentage of their population that has some education in this timespan.
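
Each line in the plot is a regional average computed by grouping rows on year and region. A small sketch of that aggregation on invented numbers follows; the toy rows are assumptions, while the column names follow the dataframe used in this report:

```python
import pandas as pd

# Toy rows: two South Asian countries and one North American country (invented values)
df = pd.DataFrame({
    "YEAR": [1980, 1980, 1980, 2010, 2010, 2010],
    "region_wb": ["South Asia", "South Asia", "North America"] * 2,
    "Percentage of Schooling Attained in Pop.": [30.0, 40.0, 98.0, 70.0, 80.0, 99.0],
})

# Mean of the schooling percentage for every year and region pair
regional = df.groupby(["YEAR", "region_wb"], as_index=False).agg(
    {"Percentage of Schooling Attained in Pop.": "mean"}
)
print(regional)
```

This is an unweighted mean, so every row counts equally regardless of population. A population-weighted average could give a different picture for regions dominated by one large country.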

In [11]:
    
Group_by_two_factors_df_lineplot_maker('Children Born in Year','region_wb',
                                       'Percentage of Schooling Attained in Pop.',
                                       df = Education_HDI_csv_renamed,
                                      titler="Average Schooling Attained in Population for Children Born in that Year every 5 Years")

When looking at the rates of schooling for children born in each year, one can see that huge strides in education came at different times for different regions. For Middle Easterners, those born in 1940 are 3 times as likely not to have had any schooling as those born in 1970, an astounding improvement. For East Asia, the big leap came for those born in the 1940s, while South Asia saw a considerable leap only for those born in the mid-eighties.

In [12]:
Group_by_two_factors_df_lineplot_maker('YEAR','region_wb',
                                       'Percentage of Schooling Attained in Pop.',
                                       df = Education_HDI_csv_renamed,
                                      titler="Average Schooling Attained in Population every 5 Years",
                                      slider_text="Starting Age")

It is also of interest to see the educational progress of certain age groups in each region over time. When breaking down the chart from before with a slider that controls for age group, one can see the differences between the generations in access to education. Those who were 70 in each of the survey years had far lower rates of schooling than the 15 year olds, with the exception of North America, which has had consistently high education.

Beyond getting the population into school at all, it is also important that people stay in school for many years, so the average years of schooling for each country is of interest too.

In [13]:
diff_Data = Difference_Between_Two_Factors_2010_1980_Plot_Maker('Percentage of Schooling Attained in Pop.','HDI',
                                                    df=Education_HDI_csv_renamed
                                                    ,titler = "Difference in Human Development Index compared to <br> Difference in Percentage Who Attended Primary School for 15 year olds for Each Country between 1980 and 2010"
                                                  ,return_dataframe=True )
diff_Data= diff_Data.groupby(by=["region_wb","Starting Age"],  as_index=False).agg({"Percentage of Schooling Attained in Pop._Diff":"mean"})

Plotly_Bar_Plot(Fact1="region_wb"
                ,Value="Percentage of Schooling Attained in Pop._Diff"
                ,df=diff_Data,text_hover="region_wb",slider_text="Starting Age",Color_Titler="region_wb"
                ,titler="Average Difference across the Regions in Schooling Attained from 1980 to 2010"
                ,category_order=True,Sort_fact="region_wb" )

The chart above shows the total difference for each age group after 30 years. So, the first point on the slider shows how much more educated those born in 1995 were than those born in 1965. We can see that South Asia has had the greatest impact on bringing education to its 15-25 year old populations, the Middle East has had the most success educating its 30-65 year olds, while East Asia brought the greatest change to its 70 year olds. Recent conflicts in the Middle East and North Africa may explain why education for 2010's 15-year-olds is not as drastically improved from 1980 as it is for other age groups in the region. The same may be said of Sub-Saharan Africa.
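
Each slider label pairs an age group with the two birth years being compared, one for the 2010 survey and one for the 1980 survey. The sketch below reproduces that arithmetic; the helper name `cohort_label` is hypothetical and introduced only for illustration:

```python
def cohort_label(starting_age, end_year=2010, start_year=1980):
    """Return 'age : birth year in end survey-birth year in start survey'."""
    return f"{starting_age} : {end_year - starting_age}-{start_year - starting_age}"

for age in (15, 45, 70):
    print(cohort_label(age))
# 15 : 1995-1965
# 45 : 1965-1935
# 70 : 1940-1910
```

Reading the labels this way confirms that the first slider position compares people born in 1995 with people born in 1965.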

In [14]:
diff_Data=Difference_Between_Two_Factors_2010_1980_Plot_Maker('Percentage of Schooling Attained in Pop.','HDI',
                                                    df=Education_HDI_csv_renamed
                                                    ,titler = "Difference in Human Development Index compared to <br> Difference in Percentage Who Attended Primary School for 15 year olds for Each Country between 1980 and 2010"
                                                  ,return_dataframe=True )

World_Map_Plotly(diff_Data,"Percentage of Schooling Attained in Pop._Diff"
                 ,titler="Difference in Schooling Rate of Each Country from 1980 to 2010",
                slider_text="Starting Age")

Plotting the same metrics on a world map, one can see how important region is in determining how much a country has progressed in education. 40 year olds in Africa and Asia appear to have taken relatively great strides in improving their education from 1980 to 2010. For the younger generations, however, South Asia and the Middle East have improved the most in recent years.

Analysis of the Number of Years of Schooling

In [15]:
    
Group_by_two_factors_df_lineplot_maker('YEAR','region_wb',
                                       'Average Years of Schooling Attained',
                                       df = Education_HDI_csv_renamed,
                                      titler="Average Years of Schooling Attained in Population Every 5 Years")

    

The average years of schooling of each population is another way of measuring educational progress. Across all ages, North America has afforded the most years of education to its populace, while Sub-Saharan Africa has consistently had the fewest. An average of 12 completed years of school would usually mean that the average citizen has completed secondary school.

In [16]:
    
Group_by_two_factors_df_lineplot_maker('Children Born in Year','region_wb',
                                       'Average Years of Schooling Attained',
                                       df = Education_HDI_csv_renamed,
                                      titler="Average Years of Schooling Attained in Population for Children Born in that Year every 5 Years")

The graph above tells a similar tale, with North America achieving completion of secondary school for its average citizen by 1950 and then plateauing. Other regions are struggling to reach even completion of primary school for their average citizen. What is clear in these graphs, however, is that there was a post-World War II acceleration of schooling for every region that is followed by a plateau.

In [17]:
Group_by_two_factors_df_lineplot_maker('YEAR','region_wb',
                                       'Average Years of Schooling Attained',
                                       df = Education_HDI_csv_renamed,
                                      titler="Average Years of Schooling Attained in Population Every 5 Years <br>Broken Down by Age Range"
                                      ,slider_text="Starting Age")

Breaking it down by age, one can see that the only region to see significant growth in average schooling for its 15-19 year olds is South Asia after 1995. For 45 year olds in this period, however, there was a clear increase in average years of education.

In [18]:
diff_Data = Difference_Between_Two_Factors_2010_1980_Plot_Maker('Average Years of Schooling Attained','HDI',
                                                    df=Education_HDI_csv_renamed
                                                    ,titler = "Difference in Human Development Index compared to <br> Difference in Percentage Who Attended Primary School for 15 year olds for Each Country between 1980 and 2010"
                                                  ,return_dataframe=True )
diff_Data= diff_Data.groupby(by=["region_wb","Starting Age"],  as_index=False).agg({"Average Years of Schooling Attained_Diff":"mean"})

Plotly_Bar_Plot(Fact1="region_wb"
                ,Value="Average Years of Schooling Attained_Diff"
                ,df=diff_Data,text_hover="region_wb",slider_text="Starting Age",Color_Titler="region_wb"
                ,titler="Average Difference across the Regions in Years of Schooling from 1980 to 2010"
                ,category_order=True,Sort_fact="region_wb" )

When aggregated across 1980 to 2010, South Asia has the highest average increase in years of schooling attained for its 15 year olds. For most regions, the number of years of schooling added peaked with their 45 year olds. This means that those born in 1965 were afforded much more education than those born in 1935 across the regions.

In [19]:
diff_Data = Difference_Between_Two_Factors_2010_1980_Plot_Maker('Average Years of Schooling Attained','HDI',
                                                    df=Education_HDI_csv_renamed
                                                    ,titler = "Difference in Human Development Index compared to <br> Difference in Average Years of Schooling Attained for Each Country between 1980 and 2010"
                                                  ,return_dataframe=True )
World_Map_Plotly(diff_Data,"Average Years of Schooling Attained_Diff"
                 ,titler="Difference in Schooling Rate of Each Country for each Age Group from 1980 to 2010",
                slider_text="Starting Age")

When looking at each country individually, one finds that the world as a whole saw the greatest improvement between the cohorts born in 1940 and those born in 1970, adding the most years of education to its 40 year olds, as the map turns very yellow at that value of the slider.

Now that educational metrics and HDI have been evaluated individually, the relationship between the two will be investigated.

Comparison Between HDI and Education Markers

In [20]:
Education_HDI_csv_renamed_Age_15_Region_Sub_Africa=  Dataframe_Loc(Education_HDI_csv_renamed,Age_Val= 15 
                                        ,Region_Val="")
Plotly_Scatter_Plot('Percentage of Schooling Attained in Pop.','HDI'
                     ,Education_HDI_csv_renamed_Age_15_Region_Sub_Africa,text_hover="Country Name",
                    Color_Titler="region_wb", titler= "Human Development Index compared to Schooling for <br> 15 year olds for Each Country"
                   ,slider_text="YEAR")

Above is the relationship between the HDI and the percentage of population schooling metric for 15 year olds, broken down by region. Though the number of countries on the chart makes the graph messy, the trendlines across the years and regions show that there is a consistent positive relationship between schooling and HDI.

It is apparent that, as the years carry on, most countries increased the percentage of their population with schooling to near 100% by 2010, with the exception of most countries in Sub-Saharan Africa.

This is not to say Sub-Saharan African countries haven't improved, as every country shifts markedly toward higher schooling as the years proceed. However, in other regions the countries with less than 80% schooling and an HDI below 0.6 are the exception, whereas for Sub-Saharan Africa they are the rule.

In [21]:
Education_HDI_csv_renamed_Age_15_Region_Sub_Africa=  Dataframe_Loc(Education_HDI_csv_renamed,Age_Val= 15 
                                        ,Region_Val="")
Plotly_Scatter_Plot('Average Years of Schooling Attained','HDI'
                     ,Education_HDI_csv_renamed_Age_15_Region_Sub_Africa,text_hover="Country Name",
                    Color_Titler="region_wb"
                    , titler= "Human Development Index compared to Average Years of Schooling  <br>for 15 year olds for Each Country in 2010"
                   ,slider_text="YEAR")

A similarly positive relationship is found between average years of schooling attained in a country and HDI. One can see that even as the years progress, the trends between the two variables remain moderately positive, except for the highly developed regions of Europe and North America.

Now that we know there is a positive relationship between the variables, let's investigate whether countries that improved the most in education also improved significantly in their HDI using difference scatterplots.
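The strength of the trend read off the scatterplots can also be put into numbers. The sketch below computes a Pearson correlation between schooling and HDI within each year. It is a minimal illustration on made-up values, not a result from the notebook's dataset; the function name `schooling_hdi_correlation` is hypothetical.

```python
import pandas as pd

# Toy data with made-up values, reusing the column names from this notebook.
toy = pd.DataFrame({
    "YEAR": [1980] * 4 + [2010] * 4,
    "Average Years of Schooling Attained": [2.0, 4.0, 6.0, 8.0, 5.0, 7.0, 9.0, 11.0],
    "HDI": [0.30, 0.45, 0.55, 0.70, 0.50, 0.60, 0.75, 0.85],
})


def schooling_hdi_correlation(df, schooling_col, by):
    """Hypothetical helper: Pearson r between a schooling column and HDI per group."""
    rows = []
    for key, group in df.groupby(by):
        rows.append({by: key, "pearson_r": group[schooling_col].corr(group["HDI"])})
    return pd.DataFrame(rows)


corr_by_year = schooling_hdi_correlation(
    toy, "Average Years of Schooling Attained", by="YEAR"
)
print(corr_by_year)
```

Passing `by="region_wb"` instead would give one coefficient per region, which is one way to check whether the relationship is weaker in the most developed regions.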

Analysis of Countries that Have Improved the Most (Difference Graphs)

In [22]:
diff_Data=Difference_Between_Two_Factors_2010_1980_Plot_Maker('Percentage of Schooling Attained in Pop.','HDI',
                                                    df=Education_HDI_csv_renamed
                                                    ,titler = "Difference in Human Development Index compared to <br> Difference in Percentage Who Attended Primary School for 15 year olds for Each Country between 1980 and 2010"
                                                  ,return_dataframe=True )

Plotly_Scatter_Plot(Fact1="HDI_Diff"
                ,Value="Percentage of Schooling Attained in Pop._Diff"
                ,df=diff_Data,text_hover="Country Name",slider_text="Starting Age",Color_Titler="region_wb"
                ,titler="Average Difference across the Regions in Schooling Attained from 1980 to 2010")

In this graph, the further a country is within the 1st quadrant, the better, as that means it has increased both its HDI and the percentage of schooling attained in its population. Notably, the DRC is the only country that now has a lower HDI and fewer 15 year olds attending school than it had in 1980. Nepal has made the greatest strides in educating its 15 year olds, while Cambodia has made the greatest strides in improving its HDI.

This graph tells us that when a country has improved greatly in HDI, it does not improve to the same degree in its percentage of schooling attained, at least when compared to the inherent correlation between the variables. In other words, a country that invested a lot in getting more children to go to school did not necessarily see that increase result in higher development for the country.
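The quadrant reading described above can be made explicit by labelling each country according to the signs of its two differences. This is a minimal sketch on made-up values; the function `label_quadrant` and the `Quadrant` column are hypothetical names introduced only for illustration.

```python
import pandas as pd

# Toy difference table with made-up values, reusing the "_Diff" column names.
toy_diff = pd.DataFrame({
    "Country Name": ["Country A", "Country B", "Country C", "Country D"],
    "HDI_Diff": [0.25, 0.10, -0.02, -0.01],
    "Percentage of Schooling Attained in Pop._Diff": [40.0, -3.0, 5.0, -2.0],
})


def label_quadrant(hdi_diff, schooling_diff):
    """Hypothetical helper: describe where a country falls on the difference plot."""
    if hdi_diff >= 0 and schooling_diff >= 0:
        return "both improved"
    if hdi_diff < 0 and schooling_diff < 0:
        return "both declined"
    return "mixed"


toy_diff["Quadrant"] = [
    label_quadrant(h, s)
    for h, s in zip(
        toy_diff["HDI_Diff"],
        toy_diff["Percentage of Schooling Attained in Pop._Diff"],
    )
]
print(toy_diff[["Country Name", "Quadrant"]])
```

Filtering on `Quadrant == "both declined"` would then list the countries that lost ground on both measures.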

In [23]:
diff_Data=Difference_Between_Two_Factors_2010_1980_Plot_Maker('Average Years of Schooling Attained','HDI',
                                                    df=Education_HDI_csv_renamed
                                                    ,titler = "Difference in Human Development Index compared to <br> Difference in Average Years of Schooling Attained for Each Country between 1980 and 2010"
                                                  ,return_dataframe=True )

Plotly_Scatter_Plot(Fact1="HDI_Diff"
                ,Value="Average Years of Schooling Attained_Diff"
                ,df=diff_Data,text_hover="Country Name",slider_text="Starting Age",Color_Titler="region_wb"
                ,titler="Average Difference across the Regions in Years of Schooling Attained from 1980 to 2010")

Here, the 1st quadrant is the ideal quadrant to reside in, as it shows that a country's average years of schooling have gone up along with its HDI. Once again, the DRC is the only country for which both changes are negative.

One can see that as the ages increase, there are only very weak relationships between the change in average years of schooling and the change in HDI. This suggests that increasing the years of schooling has only recently started correlating with having a more developed country.

Taken together, the previous two graphs suggest that countries that made huge strides in educating their people did not necessarily see that work reflected in the HDI of their countries.
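One way to check how weak these relationships are is to fit a straight line per age group and compare the slopes and R² values. The sketch below does this with NumPy on made-up values, constructed so that the younger group shows a strong relationship and the older group a weak one. It is an illustration under those assumptions rather than a result from the dataset, and `fit_by_age` is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Toy difference table with made-up values: strong link at age 15, weak at age 45.
toy_diff = pd.DataFrame({
    "Starting Age": [15] * 4 + [45] * 4,
    "Average Years of Schooling Attained_Diff": [1.0, 2.0, 3.0, 4.0] * 2,
    "HDI_Diff": [0.05, 0.10, 0.15, 0.20, 0.10, 0.05, 0.12, 0.08],
})


def fit_by_age(df, x_col, y_col):
    """Hypothetical helper: least-squares slope and R^2 of y on x per age group."""
    rows = []
    for age, group in df.groupby("Starting Age"):
        x = group[x_col].to_numpy(dtype=float)
        y = group[y_col].to_numpy(dtype=float)
        slope, intercept = np.polyfit(x, y, 1)  # degree-1 (straight line) fit
        r = np.corrcoef(x, y)[0, 1]             # Pearson correlation coefficient
        rows.append({"Starting Age": age, "slope": slope, "r_squared": r ** 2})
    return pd.DataFrame(rows)


fits = fit_by_age(toy_diff, "Average Years of Schooling Attained_Diff", "HDI_Diff")
print(fits)
```

An R² near zero for the older groups would support the reading that gains in their schooling explain little of the change in HDI.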

Conclusion

This analysis shows that some countries and regions have certainly taken bigger strides than others in developing the education and quality of life of their populations. Regions like the Middle East, North Africa, and Asia are certainly far more developed in both sectors than they were in 1980. Sub-Saharan Africa, while it has had many success stories since 1980, still has a long way to go.

In terms of whether education is the driving force behind furthering the development of the people, the results are not clear. There is certainly an association between the two factors, but it is often weak, and varies depending on the age group and region of interest.

So, though increasing the educational capabilities of the state may be one way for a state to increase its HDI, it would be ideal to pursue development through other means as well.

Learning Processes

With this project, I learned a lot about the plotly package and its capabilities. I developed a lot of my graphing skills, learning how to work with sliders, and I also found out a lot about different ways to slice and filter pandas dataframes.

Zenodo and GitHub